A method of synchronizing at least two computing
elements (CE1, CE2) that each have clocks that operate
asynchronously of the clocks of the other computing
elements includes selecting one or more signals, designated
as meta time signals, from a set of signals produced by
the computing elements (CE1, CE2), monitoring the
computing elements (CE1, CE2) to detect the production of a
selected signal by one of the computing elements (CE1),
waiting for the other computing elements (CE2) to
produce a selected signal, transmitting equally valued time
updates to each of the computing elements, and updating
the clocks of the computing elements (CE1, CE2) based
on the time updates. In a second aspect of the invention,
fault resilient, or tolerant, computers (200) are produced
by designating a first processor as a computing element
(204), designating a second processor (202) as a controller,
connecting the computing element (204) and the controller
(202) to produce a modular pair, and connecting at least
two modular pairs to produce a fault resilient or fault
tolerant computer (200). Each computing element (202, 204)
of the computer (200) performs all instructions in the same
number of cycles as the other computing elements (202,
204). The computer systems include one or more
controllers (202) and at least two computing elements (204).
FAULT RESILIENT/FAULT TOLERANT COMPUTING
Background of the Invention
The invention relates to fault resilient and fault
tolerant computing methods and apparatus.

Fault resilient computer systems can continue to 
function in the presence of hardware failures. These 
systems operate in either an availability mode or an 
integrity mode, but not both. A system is "available"
when a hardware failure does not cause unacceptable 
delays in user access, and a system operating in an 
availability mode is configured to remain online, if 
possible, when faced with a hardware error. A system has 
data integrity when a hardware failure causes no data
loss or corruption, and a system operating in an 
integrity mode is configured to avoid data loss or 
corruption, even if it must go offline to do so. 

Fault tolerant systems stress both availability 
and integrity. A fault tolerant system remains available
and retains data integrity when faced with a single 
hardware failure, and, under some circumstances, with 
multiple hardware failures. 

Disaster tolerant systems go one step beyond fault 
tolerant systems and require that loss of a computing
site due to a natural or man-made disaster will not 
interrupt system availability or corrupt or lose data. 

Prior approaches to fault tolerance include 
software checkpoint/restart, triple modular redundancy,
and pair and spare.

Checkpoint/restart systems employ two or more
computing elements that operate asynchronously and may
execute different applications. Each application
periodically stores an image of the state of the
computing element on which it is running (a checkpoint).
When a fault in a computing element is detected, the
checkpoint is used to restart the application on another
computing element (or on the same computing element once
the fault is corrected). To implement a
checkpoint/restart system, each of the applications
and/or the operating system to be run on the system must
be modified to periodically store the image of the
system. In addition, the system must be capable of
"backtracking" (that is, undoing the effects of any
operations that occurred subsequent to a checkpoint that
is being restarted).

With triple modular redundancy, three computing
elements run the same application and are operated in
cycle-by-cycle lockstep. All of the computing elements
are connected to a block of voting logic that compares
the outputs (that is, the memory interfaces) of the three
computing elements and, if all of the outputs are the
same, continues with normal operation. If one of the
outputs is different, the voting logic shuts down the
computing element that has produced the differing output.
The voting logic, which is located between the computing
elements and memory, has a significant impact on system
speed.

Pair and spare systems include two or more pairs
of computing elements that run the same application and
are operated in cycle-by-cycle lockstep. A controller
monitors the outputs (that is, the memory interfaces) of
each computing element in a pair. If the outputs differ,
both computing elements in the pair are shut down.

Summary of the Invention

According to the invention, a fault resilient
and/or fault tolerant system is obtained through use of
at least two computing elements ("CEs") that operate
asynchronously in real time (that is, from cycle to
cycle) and synchronously in so-called "meta time." The
CEs are synchronized at meta times that occur often
enough so that the applications running on the CEs do not
diverge, but are allowed to run asynchronously between
the meta times. For example, the CEs could be
synchronized once each second and otherwise run
asynchronously. Because the CEs are resynchronized at
each meta time, the CEs are said to be operating in meta
time lockstep.

WO 95/15529 PCT/US94/13350

In particular embodiments, meta times are defined
as the times at which the CEs request I/O operations. In
these embodiments, the CEs are synchronized after each
I/O operation and run asynchronously between I/O
operations. This approach is applicable to systems in
which at least two asynchronous computing elements
running identical applications always generate I/O
requests in the same order. This approach can be further
limited to resynchronization after only those I/O
requests that modify the processing environment (that is,
write requests).

Meta time synchronization according to the
invention is achieved through use of a paired modular
redundant architecture that is transparent to
applications and operating system software. According to
this architecture, each CE is paired with a controller,
otherwise known as an I/O processor ("IOP"). The IOPs
perform any I/O operations requested by or directed to
the CEs, detect hardware faults, and synchronize the CEs
with each other after each I/O operation. In systems in
which I/O requests are not issued with sufficient
frequency, the IOPs periodically synchronize the CEs in
response to so-called "quantum interrupts" generated by
inter-processor interconnect (IPI) modules connected to
the CEs.

In another particular embodiment of the invention,
rather than synchronizing the CEs based on each
particular I/O operation, the CEs are synchronized based
on a window of I/O operations. In this approach, a list
of I/O operations is maintained for each CE and the CEs
are synchronized whenever a common entry appears in all
of the lists. This approach allows flexibility as to the
order in which I/O requests are generated.
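
The window-based approach can be illustrated with a minimal sketch. It assumes each CE's list is an ordered sequence of hashable request identifiers; the function name `find_common_entry` and the choice to return the earliest common entry from the first list are illustrative assumptions, not details given in the text.

```python
def find_common_entry(windows):
    """Return the first entry of windows[0] that appears in every CE's list.

    windows is a list of per-CE lists of pending I/O request identifiers
    (a hypothetical representation). Returns None when no entry is yet
    common to all lists, meaning the CEs are not synchronized now.
    """
    if not windows:
        return None
    first, *rest = windows
    for entry in first:
        if all(entry in other for other in rest):
            return entry
    return None
```

Because membership rather than position is tested, two CEs that issue the same requests in a different order still reach a synchronization point.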

In yet another exemplary embodiment of the 
invention, the CEs are synchronized based either on 
signals that are periodically generated by the operating 
system or on hardware-generated interrupts. For example,
in the hardware interrupt approach, a processor of each 
CE is modified to generate an interrupt every N cycles 
and the CEs are synchronized in response to those 
interrupts. 

Primary components of a paired modular redundant
system include software, off-the-shelf IOPs, off-the-shelf
CEs, and pairs of customized IPI modules that plug
into expansion slots of the IOP and the CE and are
interconnected by a cable. Redundant I/O devices can be
connected to one or more of the CEs or IOPs to provide
redundant I/O and offer features such as volume shadowing
of key mass storage devices. A paired modular redundant
system can accommodate any I/O device that is compatible
with a processor used in implementing an IOP of the
system.

The paired modular redundant architecture uses
minimal custom software and hardware to enable at least
two off-the-shelf computing elements to be combined into
a fault resilient or tolerant system that runs industry
standard operating systems, such as Windows NT, DOS,
OS/2, or UNIX, and unmodified applications. Thus, the
architecture can avoid both the high costs and
inflexibility of the proprietary operating systems,
applications, and processor designs used in the prior
art.

Another advantage of the paired modular redundant
architecture of the present invention is that it offers a
certain degree of software fault tolerance. The majority
of software errors are not algorithmic. Instead, most
errors are caused by asynchrony between the computing
element and I/O devices that results in I/O race
conditions. By decoupling I/O requests from the
computing elements, the paired modular redundant
architecture should substantially reduce the number of
so-called "Heisenbug" software errors that result from
such asynchrony.

In one aspect, generally, the invention features
forming a fault tolerant or fault resilient computer by
using at least one controller to synchronize at least two
computing elements that each have clocks operating
asynchronously of the clocks of the other computing
elements. One or more signals, designated as meta time
signals, are selected from a set of signals produced by
the computing elements. Thereafter, the computing
elements are monitored to detect the production of
selected signals by one of the computing elements. Once
a selected signal is detected, the system waits for the
production of selected signals by the other computing
elements, and, upon receiving the selected signals,
transmits equal time updates to each of the computing
elements. The clocks of the computing elements are then
updated based on the time updates.
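
The synchronization steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the class names `ComputingElement` and `Controller`, the method names, and the use of a floating point time value are all hypothetical and are not taken from the described system.

```python
from dataclasses import dataclass, field


@dataclass
class ComputingElement:
    """Hypothetical CE whose local clock changes only via time updates."""
    name: str
    clock: float = 0.0

    def apply_time_update(self, meta_time: float) -> None:
        self.clock = meta_time


@dataclass
class Controller:
    """Hypothetical controller that synchronizes CEs at meta times."""
    elements: list
    pending: set = field(default_factory=set)

    def on_selected_signal(self, ce_name: str, now: float) -> bool:
        """Record a meta time signal; synchronize once all CEs have produced one."""
        self.pending.add(ce_name)
        if self.pending != {ce.name for ce in self.elements}:
            return False              # still waiting for the other CEs
        for ce in self.elements:      # the same time value goes to every CE
            ce.apply_time_update(now)
        self.pending.clear()
        return True
```

Between calls that return True, each element's clock is left untouched, which mirrors the idea that the CEs run asynchronously in real time but agree at each meta time.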

Preferred embodiments of the invention include the
features listed below. First, I/O requests are the
selected signals. The I/O requests are processed to
produce I/O responses that are transmitted with the time
updates. In addition to, or instead of, I/O requests,
quantum interrupts can be the selected signals. The
computing elements count either executed instructions or
the cycles of a clock such as the system clock, bus
clock, or I/O clock, and generate quantum interrupts
whenever a predefined number of instructions or cycles
occurs. When both I/O requests and quantum interrupts
are used as the selected signals, the computing elements
count the number of instructions or cycles that occur
without an I/O request. For example, a computing element
could be programmed to generate a quantum interrupt
whenever it processes for one hundred cycles without
generating an I/O request.

In one embodiment, instructions are counted by
loading a counter with a predetermined value, enabling
the counter with an I/O request, decrementing the value
of the counter, and signalling a quantum interrupt when
the value of the counter reaches zero. In another
approach, debugging features of the processor are used to
generate the quantum interrupts.
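
The counter-based approach can be sketched as a small software model. The class name `QuantumCounter` and its methods are hypothetical, and reloading the counter after an interrupt fires is an assumption made so that the sketch can run repeatedly; the text itself only describes the load, enable, decrement, and signal steps.

```python
class QuantumCounter:
    """Software model of a down-counter that signals quantum interrupts."""

    def __init__(self, quantum: int):
        self.quantum = quantum      # predetermined value to load
        self.remaining = quantum
        self.enabled = False

    def on_io_request(self) -> None:
        """An I/O request loads the counter and enables counting."""
        self.remaining = self.quantum
        self.enabled = True

    def tick(self) -> bool:
        """Count one instruction or cycle; return True on a quantum interrupt."""
        if not self.enabled:
            return False
        self.remaining -= 1
        if self.remaining == 0:
            self.remaining = self.quantum   # assumed reload after firing
            return True
        return False
```

With a quantum of one hundred, this model would signal an interrupt after one hundred ticks that are not interrupted by an I/O request, matching the example given earlier.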

For fault detection, the selected signals and
accompanying data, if any, from each of the computing
elements are compared. If they do not match, a signal is
generated to indicate that a fault has occurred.

In some embodiments, the computing elements wait
for time updates by pausing operation after producing the
selected signals. The computing elements resume
operation upon receipt of the time updates. In other
embodiments, the computing elements continue operation
after producing the selected signals.

To avoid problems that can be caused by
asynchronous activities of the computing elements, the
asynchronous activities are disabled. The functions of
the asynchronous activities are then performed when a
selected signal is produced. For example, normal memory
refresh functions are disabled and, in their place, burst
memory refreshes are performed each time that a selected
signal, such as an I/O request or a quantum interrupt, is
produced.

The invention also features a method of producing
fault resilient or fault tolerant computers by
designating a first processor as a computing element,
designating a second processor as a controller, and
connecting the computing element and the controller to
produce a modular pair. Thereafter, at least two modular
pairs are connected to produce a fault resilient or fault
tolerant computer. The processors used for the computing
elements need not be identical to each other, but
preferably they all perform each instruction of their
instruction sets in the same number of cycles as are
taken by the other processors. Typically, industry
standard processors are used in implementing the
computing elements and the controllers. For disaster
tolerance, at least one of the modular pairs can be
located remotely from the other modular pairs. The
controllers and computing elements are each able to run
unmodified industry standard operating systems and
applications. In addition, the controllers are able to
run a first operating system while the computing elements
simultaneously run a second operating system.

I/O fault resilience is obtained by connecting
redundant I/O devices to at least two modular pairs and
transmitting at least identical I/O write requests and
data to the redundant I/O devices. While I/O read
requests need only be transmitted to one of the I/O
devices, identical I/O read requests may be transmitted
to more than one of the I/O devices to verify data
integrity. When redundant I/O devices are connected to
three or more modular pairs, transmission of identical
I/O requests allows identification of a faulty I/O
device.
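
One way a faulty device might be identified from identical read requests is a majority comparison of the returned data. The text does not state which comparison is used, so the majority rule, the function name `find_faulty_devices`, and the dictionary representation below are assumptions made purely for illustration.

```python
from collections import Counter


def find_faulty_devices(responses):
    """Compare responses to identical read requests from redundant devices.

    responses maps a device name to the data it returned (hashable, such as
    bytes). Returns (majority_value, faulty_device_names). When no strict
    majority exists, returns (None, []) because no device can be singled out.
    """
    counts = Counter(responses.values())
    value, votes = counts.most_common(1)[0]
    if votes <= len(responses) // 2:
        return None, []
    faulty = sorted(name for name, data in responses.items() if data != value)
    return value, faulty
```

With only two devices a disagreement can be detected but not attributed, which is consistent with the statement that three or more modular pairs are needed to identify the faulty device.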

In another aspect, generally, the invention
features isolating I/O requests from computing operations
in a computer through use of I/O redirection. Typically,
I/O devices are accessed either through low level I/O
requests or by directly addressing the I/O devices. Low
level I/O requests include requests to the system's basic
input output system (e.g., BIOS), boot firmware requests,
boot software requests, and requests to the system's
physical device driver software. When a computing
element issues a low level I/O request, the invention
features using software to redirect the I/O requests to
an I/O processor. When the computing element directly
addresses the physical I/O devices, the invention
features providing virtual I/O devices that simulate the
interfaces of physical I/O devices. Directly addressed
I/O requests are intercepted and provided to the virtual
I/O devices. Periodically, the contents of the virtual
I/O devices are transmitted to the I/O processor(s) as
I/O requests. At the I/O processor(s), the transmitted
contents of the virtual I/O devices are provided to the
physical I/O devices. After the requested I/O operations
are performed, the results of the operations, if any, are
returned to the computing elements as responses to the
I/O requests. Typically, the virtual I/O devices include
a virtual keyboard and a virtual display.

The invention also features detecting and
diagnosing faults in a computer system that includes at
least two controllers that are connected to each other
and to at least two computing elements, and at least two
computing elements that are each connected to at least
two of the controllers. Each computing element produces
data and generates a value, such as an error checking
code, that relates to the data. Each computing element
then transmits the data, along with its corresponding
value, to the at least two controllers to which it is
connected. When the controllers receive the data and
associated values, they transmit the values to the other
controllers. Each controller then performs computations
on the values corresponding to each computing element and
the values corresponding to each controller. If the
results of the computations on the values corresponding
to each controller are equal, and the results of the
computations on the values corresponding to each
computing element are equal, then no fault exists.
Otherwise, a fault exists. In some instances, the
computation may be a simple bit by bit comparison.

When a fault exists, fault diagnosis is attempted
by comparing, for each one of the computing elements, all
of the values corresponding to the one computing element.
If the values corresponding to each computing element
match for each computing element, but mismatch for
different computing elements, then one of the computing
elements is faulty. If the values corresponding to only
one of the computing elements mismatch, then a path to
that computing element is faulty. If the values
corresponding to multiple computing elements mismatch,
then the controller that is connected to the mismatching
computing elements is faulty. Once identified, the
faulty element is disabled.
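
The three diagnosis rules above can be sketched as a classification over the values each controller holds. The nested dictionary layout, the function name `diagnose`, and the returned strings are hypothetical; the sketch only classifies the kind of fault and does not attempt to name the specific faulty element.

```python
def diagnose(values):
    """Classify a fault from per-CE values as seen through each controller.

    values maps a CE name to a non-empty dict of {controller name: value},
    where each value is the error checking code that controller received
    for that CE (an assumed representation).
    """
    # CEs whose own values disagree depending on the controller consulted.
    inconsistent = [
        ce for ce, per_ctl in values.items()
        if len(set(per_ctl.values())) > 1
    ]
    if not inconsistent:
        agreed = {next(iter(per_ctl.values())) for per_ctl in values.values()}
        if len(agreed) <= 1:
            return "no fault"
        # Each CE is self-consistent, but the CEs differ from one another.
        return "computing element faulty"
    if len(inconsistent) == 1:
        return "path to " + inconsistent[0] + " faulty"
    return "controller faulty"
```

A full implementation would go on to identify and disable the specific element, for example by finding the controller common to all mismatching computing elements.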

A system according to the invention can restore
itself to full capability after a faulty element (that
is, a CE, an IOP, a storage device, etc.) is repaired.
The system does so by transferring the state of an active
element to the repaired element and, thereafter,
reenabling the repaired element. Inactive or repaired
processors are activated by transferring the operational
state of an active processor or processors to the
inactive processor through a controller. When the
inactive processor is a computing element, the
operational state of an active computing element (or
elements) is transferred through a controller. When the
inactive processor is a controller, the operating state
of an active controller is directly transferred. The
transfer can occur either when system operation is paused
or as a background process.

This recovery capability can also be used to
provide on-line upgrades of hardware, software, or both
by causing a processor of the system to fail by, for
example, turning it off. The upgrade is then performed
by either replacing or modifying the disabled processor.
The upgraded processor is then turned on and reactivated
as discussed above.

The invention also features a single controller,
dual computing element system in which a controller is
connected to two computing elements. In this computer
system, I/O operations by the computing elements are
intercepted and redirected to the controller. Typically,
the controller and the two computing elements each
include an industry standard motherboard, and are each
able to run unmodified industry standard operating
systems and applications. In addition, the controller is
able to run a first operating system while the computing
elements simultaneously run a second operating system.



The single controller system can be expanded to
include a second controller connected both to the first
controller and to the two computing elements. For
purposes of providing limited disaster resilience, the
first controller and one of the computing elements can be
placed in a location remote from the second controller
and the other computing element, and can be connected to
the second controller and the other computing element by
a communications link.

For improved availability and performance, the
dual controller, dual computing element system can be
connected to an identical second system. The two systems
then run a distributed computing environment in which one
of the systems runs a first portion of a first
application and the other system runs either a second
application or a second portion of the first application.

In another embodiment, the invention features a
computer system that includes three controllers connected
to each other and three computing elements that are each
connected to different pairs of the three controllers.
This system, like the other systems, also features
intercepting I/O operations by the computing elements and
redirecting them to the controllers for processing. For
disaster resilience, the first controller and one of the
computing elements are placed in a location remote from
the remaining controllers and computing elements, or each
controller/computing element pair is placed in a
different location.

A disaster tolerant system is created by
connecting at least two of the three controller systems
described above. The three controller systems are placed
in remote locations and connected by a communications
link.

Brief Description of the Drawings

Fig. 1 is a block diagram of a partially fault
resilient system.

Fig. 2 is a block diagram of system software of
the system of Fig. 1.

Fig. 3 is a flowchart of a procedure used by an
IOP Monitor of the system software of Fig. 2.

Fig. 4 is a block diagram of an IPI module of the
system of Fig. 1.

Fig. 5 is a state transition table for the system
of Fig. 1.

Fig. 6 is a block diagram of a fault resilient
system.

Fig. 7 is a block diagram of a distributed fault
resilient system.

Fig. 8 is a block diagram of a fault tolerant
system.

Fig. 9 is a flowchart of a fault diagnosis procedure
used by IOPs of the system of Fig. 8.

Fig. 10 is a block diagram of a disaster tolerant
system.

Description of the Preferred Embodiments

As illustrated in Fig. 1, a fault resilient system
10 includes an I/O processor ("IOP") 12 and two computing
elements ("CEs") 14a, 14b (collectively referred to as
CEs 14). Because system 10 includes only a single IOP 12
and therefore cannot recover from a failure in IOP 12,
system 10 is not entirely fault resilient.

IOP 12 includes two inter-processor interconnect
("IPI") modules 16a, 16b that are connected,
respectively, to corresponding IPI modules 18a, 18b of
CEs 14 by cables 20a, 20b. IOP 12 also includes a
processor 22, a memory system 24, two hard disk drives
26, 28, and a power supply 30. Similarly, each CE 14
includes a processor 32, a memory system 34, and a power
supply 36. Separate power supplies 36 are used to ensure
fault resilience in the event of a power supply failure.
Processors 32a, 32b are "identical" to each other in
that, for every instruction, the number of cycles
required for processor 32a to perform an instruction is
identical to the number of cycles required for processor
32b to perform the same instruction. In the illustrated
embodiment, system 10 has been implemented using standard
Intel 486 based motherboards for processors 22, 32 and
four megabytes of memory for each of memory systems 24,
34.

IOP 12 and CEs 14 of system 10 run unmodified
operating system and applications software, with hard
drive 26 being used as the boot disk for the IOP and hard
drive 28 being used as the boot disk for CEs 14. In
truly fault resilient or fault tolerant systems that
include at least two IOPs, each hard drive would also be
duplicated.

In the illustrated embodiment, the operating
system for IOP 12 and CEs 14 is DOS. However, other
operating systems can also be used. Moreover, IOP 12 can
run a different operating system from the one run by CEs
14. For example, IOP 12 could run Unix while CEs 14 run
DOS. This approach is advantageous because it allows CEs
14 to access peripherals from operating systems that do
not support the peripherals. For example, if CEs 14 were
running an operating system that did not support CD-ROM
drives, and IOP 12 were running one that did, CEs 14
could access the CD-ROM drive by issuing I/O requests
identical to those used to, say, access a hard drive.
IOP 12 would then handle the translation of the I/O
request to one suitable for accessing the CD-ROM drive.

Referring also to Fig. 2, system 10 includes
specialized system software 40 that controls the booting
and synchronization of CEs 14, disables local time in CEs
14, redirects all I/O requests from CEs 14 to IOP 12 for
execution, and returns the results of the I/O requests,
if any, from IOP 12 to CEs 14.

System software 40 includes two sets of IPI BIOS
42 that are ROM-based and are each located in the IPI
module 18 of a CE 14. IPI BIOS 42 are used in bootup and
synchronization activities. When a CE 14 is booted, IPI
BIOS 42 replaces the I/O interrupt addresses in the
system BIOS interrupt table with addresses that are
controlled by CE Drivers 44. The interrupt addresses
that are replaced include those corresponding to video
services, fixed disk services, serial communications
services, keyboard services, and time of day services.
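
The replacement of interrupt addresses can be modelled as rewriting entries in a table. In the sketch below the table is a plain dictionary and the handlers are arbitrary Python objects; the interrupt numbers are the ones conventionally assigned to these BIOS services on PC-compatible systems and are an assumption, since the text names the services but not their numbers. The function and parameter names are hypothetical.

```python
# Conventional PC BIOS software interrupt numbers (assumed, not from the text).
IO_VECTORS = {
    0x10: "video services",
    0x13: "fixed disk services",
    0x14: "serial communications services",
    0x16: "keyboard services",
    0x1A: "time of day services",
}


def hook_io_vectors(interrupt_table, make_redirect):
    """Replace the I/O entries of interrupt_table with redirecting handlers.

    make_redirect(vector, service) returns the handler to install for that
    vector. The original entries are returned so they could be restored.
    """
    saved = {}
    for vector, service in IO_VECTORS.items():
        saved[vector] = interrupt_table[vector]
        interrupt_table[vector] = make_redirect(vector, service)
    return saved
```

Entries that are not listed keep their original handlers, so non-I/O BIOS calls continue to be served locally.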

IPI BIOS 42 also disables normal memory refreshing
to ensure that memory refreshing, which affects the
number of cycles during which a CE 14 is actually
processing, is controlled by system software 40. Memory
refreshing is required to maintain memory integrity. In
known refreshing methods, memory is refreshed
periodically, with one block of memory being refreshed at
the end of each refresh period. The duration of the
refresh period is selected so that the entire memory is
refreshed within the memory's refresh limit. Thus, for
example, if a memory has 256 blocks and an 8 ms refresh
limit, then the refresh period is 31.25 µs (8 ms / 256).

In the described embodiment, IPI BIOS 42 disables
memory refreshing by placing a counter used in the Intel
486 motherboard to control memory refreshing in a mode
that requires a gate input to the counter to change in
order to increment. Because the gate input is typically
connected to the power supply, the gate input never
changes and the counter is effectively disabled.

Two CE Drivers 44 of system software 40 handle
memory refreshing by burst refreshing multiple blocks of
memory each time that an I/O request or quantum interrupt
is generated. CE Drivers 44 are stored on CE boot disk
28 and are run by CEs 14. In addition to performing
burst memory refreshes, CE Drivers 44 intercept I/O
requests to the system BIOS and redirect them through
IPI modules 18 to IOP 12 for execution. CE Drivers 44
also respond to interrupt requests from IPI modules 18,
disable the system clock, and, based on information
supplied by IOP Monitor 48, control the time of day of
CEs 14.
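
The refresh arithmetic and the size of a burst refresh can be sketched as follows. The first function restates the refresh period calculation from the example of 256 blocks and an 8 ms limit. The second is an assumption: the text does not say how many blocks a burst covers, so the sketch supposes that a burst covers as many blocks as periodic refreshing would have covered in the same interval. Both function names are hypothetical.

```python
import math


def refresh_period_us(refresh_limit_ms: float, blocks: int) -> float:
    """Conventional refresh period: the refresh limit divided by the block count."""
    return refresh_limit_ms * 1000.0 / blocks


def blocks_due(elapsed_us: float, refresh_limit_ms: float, blocks: int) -> int:
    """Blocks a burst refresh would cover after elapsed_us without refreshing.

    Assumed policy: match what periodic refreshing would have done in the
    same interval, capped at the total number of blocks.
    """
    period = refresh_period_us(refresh_limit_ms, blocks)
    return min(blocks, math.ceil(elapsed_us / period))
```

Under this assumed policy, the interval between selected signals must stay below the refresh limit, which is one reason quantum interrupts are useful when I/O requests are infrequent.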

An IOP Driver 46 that is located on IOP boot disk
26 and is run by IOP 12 handles I/O requests from CEs 14
by redirecting them to an IOP Monitor 48 for processing
and transmitting the results from IOP Monitor 48 to CEs
14. IOP Driver 46 communicates with CE Drivers 44 using
a packet protocol.

IOP Monitor 48 is located on IOP boot disk 26 and
is run by IOP 12. IOP Monitor 48 controls system 10 and
performs the actual I/O requests to produce the results
that are transmitted by IOP Driver 46 to CEs 14.

System software 40 also includes console software
49 that runs on IOP 12 and provides for user control of
system 10. Using console software 49, a user can reset,
boot, or synchronize a CE 14. The user can also set one
or both of CEs 14 to automatically boot (autoboot) and/or
automatically synchronize (autosync) after being reset or
upon startup. The ability to control each CE 14 is
useful both during normal operation and for test
purposes. Using console software 49, the user can also
place system 10 into either an integrity mode in which
IOP Monitor 48 shuts down both CEs 14 when faced with a
miscompare error, a first availability mode in which IOP
Monitor 48 disables CE 14a when faced with a miscompare
error, or a second availability mode in which IOP Monitor
48 disables CE 14b when faced with a miscompare error.
Finally, console software 49 allows the user to request
the status of system 10. In an alternative embodiment,
console software 49 could be implemented using a separate
processor that communicates with IOP 12.

Each CE 14 runs a copy of the same application and
the same operating system as that run by the other CE 14.
Moreover, the contents of memory systems 34a and 34b are
the same, and the operating contexts of CEs 14 are the
same at each synchronization time. Thus, IOP Monitor 48
should receive identical sequences of I/O requests from
CEs 14.

As shown in Fig. 3, IOP Monitor 48 processes and
monitors I/O requests according to a procedure 100.
Initially, IOP Monitor 48 waits for an I/O request from
one of CEs 14 (step 102). Upon receiving an I/O request
packet from, for example, CE 14b, IOP Monitor 48 waits
either for an I/O request from CE 14a or for the
expiration of a timeout period (step 104). Because
system 10 uses the DOS operating system, which halts
execution of an application while an I/O request is being
processed, IOP Monitor 48 is guaranteed not to receive an
I/O request from CE 14b while waiting (step 104) for the
I/O request from CE 14a.

Next, IOP Monitor 48 checks to determine whether
the timeout period has expired (step 106). If not (that
is, an I/O request packet from CE 14a has arrived), IOP
Monitor 48 compares the checksums of the packets (step
108), and, if the checksums are equal, processes the I/O
request (step 110). After processing the I/O request,
IOP Monitor 48 issues a request to the system BIOS of IOP
12 for the current time of day (step 112).

After receiving the time of day, IOP Monitor 48
assembles an IPI packet that includes the time of day and
the results, if any, of the I/O request (step 114) and
sends the IPI packet to IOP Driver 46 (step 116) for
transmission to CEs 14. When CEs 14 receive the IPI
packet, they use the transmitted time of day to update
their local clocks which, as already noted, are otherwise
disabled.

As required by DOS, execution in CEs 14 is
suspended until IOP Monitor 48 returns the results of the
I/O request through IOP Driver 46. Because, before
execution is resumed, the times of day of both CEs 14 are
updated to a common value (the transmitted time of day
from the IPI packet), the CEs 14 are kept in time
synchronization with the transmitted time of day being
designated the meta time. If a multitasking operating
system were employed, execution in CEs 14 would not be
suspended while IOP Monitor 48 performed the I/O request.
Instead, processing in CEs 14 would be suspended only
until receipt of an acknowledgement indicating that IOP
Monitor 48 has begun processing the I/O request (step
110). The acknowledgement would include the time of day
and would be used by CEs 14 to update the local clocks.

After sending the IPI packet to IOP Driver 46, IOP
Monitor 48 verifies that both of CEs 14 are online (step
118), and, if so, waits for another I/O request from one
of CEs 14 (step 102).

If the timeout period has expired (step 106), IOP
Monitor 48 disables the CE 14 that failed to respond
(step 119) and processes the I/O request (step 110).

If there is a miscompare between the checksums of
the packets from CEs 14 (step 108), IOP Monitor 48 checks
to see if system 10 is operating in an availability mode
or an integrity mode (step 120).

If system 10 is operating in an availability mode,
IOP Monitor 48 disables the appropriate CE 14 based on
the selected availability mode (step 122), and processes
the I/O request (step 110). Thereafter, when IOP Monitor
48 checks whether both CEs 14 are online (step 118), and
assuming that the disabled CE 14 has not been repaired
and reactivated, IOP Monitor 48 then waits for an I/O
request from the online CE 14 (step 124). Because system
10 is no longer fault resilient, when an I/O request is
received, IOP Monitor 48 immediately processes the I/O
request (step 110).

If system 10 is operating in an integrity mode
when a miscompare is detected, IOP Monitor 48 disables
both CEs 14 (step 126) and stops processing (step 128).
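
The decision logic of procedure 100 can be sketched in a few lines
of Python. This is only an illustrative model under stated
assumptions: the function name handle_request, the encoding of a
packet as one checksum per CE (with None standing for a packet that
did not arrive before the timeout), and the "preferred CE" rule used
to pick which CE survives in availability mode are all hypothetical
and are not taken from the specification.

```python
from typing import Dict, List, Optional, Tuple


def handle_request(
    checksums: Dict[str, Optional[int]],
    mode: str = "availability",
    preferred_ce: str = "CE 14a",
) -> Tuple[str, List[str]]:
    """Model one pass through steps 106-128 of procedure 100.

    checksums maps each online CE to the checksum of its packet, or to
    None if no packet arrived before the timeout (hypothetical encoding).
    Returns (action, disabled_ces), where action is "process" or "stop".
    """
    # Steps 106 and 119: a CE that timed out is disabled; the request proceeds.
    missing = [ce for ce, crc in checksums.items() if crc is None]
    if missing:
        return "process", missing

    # Step 108: compare the checksums of the packets.
    if len(set(checksums.values())) == 1:
        return "process", []                  # step 110, nothing disabled

    # Steps 120-128: on a miscompare the operating mode decides.
    if mode == "integrity":
        return "stop", sorted(checksums)      # steps 126 and 128
    # Step 122: availability mode keeps one CE (assumed: the preferred one).
    losers = [ce for ce in checksums if ce != preferred_ce]
    return "process", losers
```

With a single online CE the checksum set trivially has one member, so
the request is processed immediately, which mirrors the behaviour
described for step 124.
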
Referring again to Figs. 1 and 2, when the
application or the operating system of, for example, CE
14a makes a non-I/O call to the system BIOS, the system
BIOS executes the request and returns the results to the
application without invoking system software 40.
However, if the application or the operating system makes
an I/O BIOS call, CE Driver 44a intercepts the I/O
request. After intercepting the I/O request, CE Driver
44a packages the I/O request into an IPI packet and
transmits the IPI packet to IOP 12.

When IPI module 16a of IOP 12 detects transmission
of an IPI packet from CE 14a, IPI module 16a generates an
interrupt to IOP Driver 46. IOP Driver 46 then reads the
IPI packet.

As discussed above, IOP Monitor 48 responds to the
IPI packet from CE 14a according to procedure 100. As
also discussed, assuming that there are no hardware
faults, IOP Driver 46 eventually transmits an IPI packet
that contains the results of the I/O request and the time
of day to CEs 14.

IPI modules 18 of CEs 14 receive the IPI packet
from IOP 12. CE Drivers 44 unpack the IPI packet, update
the time of day of CEs 14, and return control of CEs 14
to the application or the operating system running on CEs
14.
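
The effect of the shared time-of-day update can be illustrated with a
toy model. The class and function names below (ComputingElement,
broadcast_response) and the use of a floating point number for the
time of day are assumptions made for this sketch only; the
specification does not define a software interface of this form.

```python
from dataclasses import dataclass


@dataclass
class ComputingElement:
    """Toy CE whose local clock changes only when the IOP supplies a time."""
    name: str
    time_of_day: float = 0.0
    suspended: bool = False

    def issue_io_request(self) -> None:
        # Execution is suspended until the IOP returns a response.
        self.suspended = True

    def receive_response(self, meta_time: float) -> None:
        # The transmitted time of day replaces the local clock value,
        # and control returns to the application.
        self.time_of_day = meta_time
        self.suspended = False


def broadcast_response(ces, iop_time_of_day: float) -> None:
    """Send the same time of day (the meta time) to every CE."""
    for ce in ces:
        ce.receive_response(iop_time_of_day)


ces = [ComputingElement("CE 14a"), ComputingElement("CE 14b")]
for ce in ces:
    ce.issue_io_request()
broadcast_response(ces, iop_time_of_day=1234.5)
```

Because both CEs resume only after receiving the same value, their
clocks agree in meta time even though their hardware clocks run
asynchronously.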

If no I/O requests are issued within a given time
interval, the IPI module 18 of a CE 14 generates a
so-called quantum interrupt that invokes the CE Driver 44 of
the CE 14. In response, the CE Driver 44 creates a
quantum interrupt IPI packet and transmits it to IOP 12.
IOP Monitor 48 treats the quantum interrupt IPI packet as
an IPI packet without an I/O request. Thus, IOP Monitor
48 detects the incoming quantum interrupt IPI packet
(step 102 of Fig. 3) and, if a matching quantum interrupt
IPI packet is received from the other CE 14 (steps 104,
106, and 108 of Fig. 3), issues a request to the system
BIOS of IOP 12 for the current time of day (step 112 of
Fig. 3). IOP Monitor 48 then packages the current time
of day into a quantum response IPI packet (step 114 of
Fig. 3) that IOP Driver 46 then sends to CEs 14 (step 116
of Fig. 3). CE Drivers 44 respond to the quantum
response IPI packet by updating the time of day and
returning control of CEs 14 to the application or the
operating system running on CEs 14.

If IOP Monitor 48 does not receive a quantum
interrupt IPI packet from the other CE 14 within a
predefined timeout period (step 106 of Fig. 3), IOP
Monitor 48 responds by disabling the non-responding CE
14.

As shown in Fig. 1, IPI modules 16, 18 and cables
20 provide all of the hardware necessary to produce a
fault resilient system from the standard Intel 486 based
motherboards used to implement processors 22, 32. An IPI
module 16 and an IPI module 18, which are implemented
using identical boards, each perform similar functions.

As illustrated in Fig. 4, an IPI module 18
includes a control logic 50 that communicates I/O
requests and responses between the system bus of a
processor 32 of a CE 14 and a parallel interface 52 of
IPI module 18. Parallel interface 52, in turn,
communicates with the parallel interface of an IPI module
16 through a cable 20. Parallel interface 52 includes a
sixteen bit data output port 54, a sixteen bit data input
port 56, and a control port 58. Cable 20 is configured
so that data output port 54 is connected to the data
input port of the IPI module 16, data input port 56 is
connected to the data output port of the IPI module 16,
and control port 58 is connected to the control port of
the IPI module 16. Control port 58 implements a
handshaking protocol between IPI module 18 and the IPI
module 16.

Control logic 50 is also connected to an IPI BIOS
ROM 60. At startup, control logic 50 transfers IPI BIOS
42 (Fig. 2), the contents of IPI BIOS ROM 60, to
processor 32 through the system bus of processor 32.

A QI counter 62, also located on IPI module 18,
generates quantum interrupts as discussed above. QI
counter 62 includes a clock input 64 that is connected to
the system clock of processor 32 and a gate input 66 that
is connected to control logic 50. Gate input 66 is used
to activate and reset the counter value of QI counter 62.
When activated, QI counter 62 decrements the counter
value by one during each cycle of the system clock of
processor 32. When the counter value reaches zero, QI
counter 62 generates a quantum interrupt that, as
discussed above, activates CE Driver 44 (Fig. 2).

CE Driver 44 deactivates QI counter 62 at the
beginning of each I/O transaction. CE Driver 44
deactivates QI counter 62 by requesting an I/O write at a
first address, known as the QI deactivation address.
Control logic 50 detects the I/O write request and
deactivates QI counter 62 through gate input 66. Because
this particular I/O write is for control purposes only,
control logic 50 does not pass the I/O write to parallel
interface 52. At the conclusion of each I/O transaction,
CE Driver 44 resets and activates QI counter 62 by
requesting an I/O write to a second address, known as the
QI activation address. Control logic 50 responds by
resetting and activating QI counter 62.
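
The behaviour of QI counter 62 can be modelled as a gated
down-counter. The following is a minimal sketch, not the hardware
design: the class name QICounter, the reload_value parameter, and the
choice to leave the counter idle after it fires until it is re-armed
are assumptions introduced for illustration.

```python
class QICounter:
    """Down-counter that raises a quantum interrupt on reaching zero."""

    def __init__(self, reload_value: int) -> None:
        self.reload_value = reload_value   # assumed programmable quantum
        self.value = reload_value
        self.active = False

    def activate(self) -> None:
        # Models the write to the QI activation address: reset and start.
        self.value = self.reload_value
        self.active = True

    def deactivate(self) -> None:
        # Models the write to the QI deactivation address.
        self.active = False

    def clock_tick(self) -> bool:
        """Advance one system clock cycle; return True on a quantum interrupt."""
        if not self.active:
            return False
        self.value -= 1
        if self.value == 0:
            self.active = False   # assumption: counter idles until re-armed
            return True
        return False
```

A driver model would call deactivate() at the start of each I/O
transaction and activate() at its end, so that a quantum is always
measured from the most recent synchronization point.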

In an alternative approach, quantum interrupts are
generated through use of debugging or other features
available in processor 32. Some commonly available
processors include debugging or trap instructions that
trap errors by transferring control of the processor to a
designated program after the completion of a selected
number of instructions following the trap instruction.
In this approach, each time that CE Driver 44 returns
control of processor 32 to the application or operating
system, CE Driver 44 issues a trap instruction to
indicate that control of processor 32 should be given to
CE Driver 44 upon completion of, for example, 300
instructions. After processor 32 completes the indicated
300 instructions, the trap instruction causes control of
processor 32 to be returned to CE Driver 44. In the
event that an I/O request activates CE Driver 44 prior to
completion of the indicated number of instructions, CE
Driver 44 issues an instruction that cancels the trap
instruction.

IPI module 18 is also used in activating an
offline CE 14. As discussed below, before an offline CE
14 is activated, the contents of the memory system 34 of
the active CE 14 are copied into the memory system 34 of
the offline CE 14. To minimize the effects of this
copying on the active CE 14, the processor 32 of the
active CE 14 is permitted to continue processing and the
memory is copied only during cycles in which the system
bus of the processor 32 of the active CE 14 is not in
use.

To enable processor 32 to continue processing
while the memory is being copied, IPI module 18 accounts
for memory writes by the processor 32 to addresses that
have already been copied to the offline CE 14. To do so,
control logic 50 monitors the system bus and, when the
processor 32 writes to a memory address that has already
been copied, stores the address in a FIFO 68. When the
memory transfer is complete, or when FIFO 68 is full, the
contents of memory locations associated with the memory
addresses stored in FIFO 68 are copied to the offline CE
14 and FIFO 68 is emptied. In other approaches, FIFO 68
is modified to store both memory addresses and the
contents of memory locations associated with the
addresses, or to store the block addresses of memory
blocks to which memory addresses being written belong.
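
The address-logging scheme can be illustrated with a small
simulation. This is a sketch under simplifying assumptions: memory is
a Python list copied one address per step in ascending order, and the
writes argument (a hypothetical schedule mapping a copy step to an
address and value written by the active processor just before that
step) stands in for bus activity. None of these names appear in the
specification.

```python
from collections import deque
from typing import Dict, List, Tuple


def copy_memory(
    active: List[int],
    offline: List[int],
    writes: Dict[int, Tuple[int, int]],
    fifo_capacity: int = 4,
) -> None:
    """Copy active into offline while the active processor keeps writing."""
    fifo: deque = deque()

    def drain() -> None:
        # Re-copy every location whose address was logged, then empty the FIFO.
        while fifo:
            addr = fifo.popleft()
            offline[addr] = active[addr]

    for step in range(len(active)):
        if step in writes:
            addr, value = writes[step]
            active[addr] = value
            if addr < step:              # address was already copied
                fifo.append(addr)
                if len(fifo) == fifo_capacity:
                    drain()              # FIFO is full
        offline[step] = active[step]     # normal copy of the next address
    drain()                              # memory transfer is complete
```

Writes to addresses that have not yet been copied need no logging,
since the ordinary copy picks up the new value when it reaches them.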

IPI module 18 also handles non-BIOS I/O requests.
In some computer systems, the BIOS is too slow to
effectively perform I/O operations such as video display.
As a result, some less structured or less disciplined
operating systems, such as DOS or UNIX, allow
applications to circumvent the BIOS and make non-BIOS I/O
requests by directly reading from or writing to the
addresses associated with I/O devices. These non-BIOS
I/O requests, which cannot be intercepted by changing the
system interrupt table, as is done in connection with,
for example, I/O disk reads and writes, are problematic
for a system in which synchronization requires tight
control of the I/O interface.

To remedy this problem, and to assure that even
non-BIOS I/O requests can be isolated and managed by IOP
12, IPI module 18 includes virtual I/O devices that mimic
the hardware interfaces of physical I/O devices. These
virtual I/O devices include a virtual display 70 and a
virtual keyboard 72. As needed, other virtual I/O
devices such as a virtual mouse or virtual serial and
parallel ports could also be used.

In practice, control logic 50 monitors the system
bus for read or write operations directed to addresses
associated with non-BIOS I/O requests to system I/O
devices. When control logic 50 detects such an
operation, control logic 50 stores the information
necessary to reconstruct the operation in the appropriate
virtual device. Thus, for example, when control logic 50
detects a write operation directed to an address
associated with the display, control logic 50 stores the
information necessary to reconstruct the operation in
virtual display 70. Each time that a BIOS I/O request or
a quantum interrupt occurs, CE Driver 44 scans the
virtual I/O devices and, if the virtual devices are not
empty, assembles the information stored in the virtual
devices into an IPI packet and transmits the IPI packet
to IOP 12. IOP 12 treats the packet like a BIOS I/O
request using procedure 100 discussed above. When
control logic 50 detects a read addressed to a virtual
I/O device, control logic 50 assembles the read request
into an IPI packet for handling by IOP 12. IOP 12 treats
the IPI packet like a standard BIOS I/O request.
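
The buffering of non-BIOS writes in virtual devices can be sketched as
follows. This is a toy model only: the names VirtualDevice,
snoop_write, and flush_to_packet are hypothetical, a packet is
represented as a plain dictionary, and the address ranges used in any
example are placeholders that ignore the distinction between memory
and I/O port address spaces.

```python
from typing import Dict, List, Tuple


class VirtualDevice:
    """Buffers bus writes aimed at one physical device's addresses."""

    def __init__(self, name: str, first: int, last: int) -> None:
        self.name = name
        self.first, self.last = first, last     # inclusive address range
        self.pending: List[Tuple[int, int]] = []

    def claims(self, address: int) -> bool:
        return self.first <= address <= self.last


def snoop_write(devices: List[VirtualDevice], address: int, value: int) -> bool:
    """Control-logic side: record a write if a virtual device claims it."""
    for dev in devices:
        if dev.claims(address):
            dev.pending.append((address, value))
            return True
    return False        # ordinary write, not directed at an I/O device


def flush_to_packet(devices: List[VirtualDevice]) -> Dict[str, List[Tuple[int, int]]]:
    """CE Driver side: gather non-empty devices into one packet and empty them."""
    packet = {dev.name: dev.pending for dev in devices if dev.pending}
    for dev in devices:
        dev.pending = []
    return packet
```

In this model flush_to_packet would be called on every BIOS I/O
request or quantum interrupt, so buffered writes reach the IOP only at
synchronization points.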

Referring to Fig. 5, each CE 14 always operates in
one of eight states and, because there are only a limited
number of permissible state combinations, system 10
always operates in one of fourteen states. The major CE
operating states are OFFLINE, RTB (ready to boot),
BOOTING, ACTIVE, RTS (ready to sync), WAITING, M_SYNC
(synchronizing as master), and S_SYNC (synchronizing as
slave). IOP Monitor 48 changes the operating states of
CEs 14 based on the state of system 10 and user commands
from console software 49. Through console software 49, a
user can reset a CE 14 at any time. Whenever the user
resets a CE 14, or a fault occurs in the CE 14, IOP
Monitor 48 changes the state of the CE 14 to OFFLINE.

At startup, system 10 is operating with both CEs
14 OFFLINE (state 150). System 10 operates in the upper
states of Fig. 5 (states 152-162) when CE 14a becomes
operational before CE 14b and in the lower states (states
166-176) when CE 14b is the first to become operational.
If CEs 14 become operational simultaneously, the first
operational CE 14 to be recognized by IOP Monitor 48 is
treated as the first to become operational.

When a CE 14 indicates that it is ready to boot by
issuing a boot request, the state of the CE 14 changes to
RTB if the CE 14 is not set to autoboot or to BOOTING if
the CE 14 is set to autoboot. For example, if CE 14a
issues a boot request when both CEs 14 are OFFLINE, and
CE 14a is not set to autoboot, then the state of CE 14a
changes to RTB (state 152). Thereafter, IOP Monitor 48
waits for the user, through console software 49, to boot
CE 14a. When the user boots CE 14a, the state of CE 14a
changes to BOOTING (state 154). If the user resets CE
14a, the state of CE 14a changes to OFFLINE (state 150).

If both CEs 14 are OFFLINE when CE 14a issues a
boot request, and CE 14a is set to autoboot, the state of
CE 14a changes to BOOTING (state 154). If CE 14a boots
successfully, the state of CE 14a changes to ACTIVE
(state 156).

When CE 14a is ACTIVE, and CE 14b issues a boot
request, or if CE 14b had issued a boot request while the
state of CE 14a was transitioning from OFFLINE to ACTIVE
(states 152-156), the state of CE 14b changes to RTS
(state 158) if CE 14b is not set to autosync and otherwise
to WAITING (state 160). If the state of CE 14b changes to
RTS (state 158), IOP Monitor 48 waits for the user to issue
a synchronize command to CE 14b. When the user issues
such a command, the state of CE 14b changes to WAITING
(state 160).

Once CE 14b is WAITING, IOP Monitor 48 copies the
contents of memory system 34a of CE 14a into memory
system 34b of CE 14b. Once the memory transfer is
complete, IOP Monitor 48 waits for CE 14a to transmit a
quantum interrupt or I/O request IPI packet. Upon
receipt of such a packet, IOP Monitor 48 changes the
state of CE 14a to M_SYNC and the state of CE 14b to
S_SYNC (state 162), and synchronizes the CEs 14. This
synchronization includes responding to any memory changes
that occurred while IOP Monitor 48 was waiting for CE 14a
to transmit a quantum interrupt or I/O request IPI
packet. Upon completion of the synchronization, the
states of the CEs 14 both change to ACTIVE (state 164)
and system 10 is deemed to be fully operational.

In an alternative implementation, IOP Monitor 48
does not wait for memory transfer to complete before
changing the state of CE 14a to M_SYNC and the state of
CE 14b to S_SYNC (state 162). Instead, IOP Monitor 48
makes this state change upon receipt of an IPI packet
from CE 14a and performs the memory transfer as part of
the synchronization process.

Similar state transitions occur when CE 14b is the
first CE 14 to issue a boot request. Thus, assuming that
CE 14b is not set to autoboot, CE 14b transitions from
OFFLINE (state 150) to RTB (state 166) to BOOTING (state
168) to ACTIVE (state 170). Similarly, once CE 14b is
ACTIVE, and assuming that CE 14a is not set to autosync,
CE 14a transitions from OFFLINE (state 170) to RTS (state
172) to WAITING (state 174) to S_SYNC (state 176) to
ACTIVE (state 164).
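
The per-CE transitions described above can be summarized as a small
state machine. The sketch below is an interpretation, not part of the
specification: the event names (boot_request, user_boot, boot_ok,
user_sync, sync_start, sync_done, reset, fault) and the parameters
auto and peer_first are hypothetical labels, and the sketch assumes
that RTS is entered when autosync is off, in line with the transitions
listed for CE 14a.

```python
def next_state(state: str, event: str, *, auto: bool = False,
               peer_first: bool = False) -> str:
    """Return the next state of one CE.

    auto means autoboot (first CE) or autosync (second CE);
    peer_first means the other CE became operational first.
    """
    if event in ("reset", "fault"):
        return "OFFLINE"                       # from any state
    if state == "OFFLINE" and event == "boot_request":
        if peer_first:
            return "WAITING" if auto else "RTS"
        return "BOOTING" if auto else "RTB"
    table = {
        ("RTB", "user_boot"): "BOOTING",
        ("BOOTING", "boot_ok"): "ACTIVE",
        ("RTS", "user_sync"): "WAITING",
        ("WAITING", "sync_start"): "S_SYNC",   # this CE is the slave
        ("ACTIVE", "sync_start"): "M_SYNC",    # this CE is the master
        ("S_SYNC", "sync_done"): "ACTIVE",
        ("M_SYNC", "sync_done"): "ACTIVE",
    }
    return table.get((state, event), state)    # ignore irrelevant events


def run(events, **kwargs) -> str:
    """Apply a sequence of events to a CE that starts OFFLINE."""
    state = "OFFLINE"
    for event in events:
        state = next_state(state, event, **kwargs)
    return state
```

For example, run(["boot_request", "user_boot", "boot_ok"]) walks the
first CE from OFFLINE through RTB and BOOTING to ACTIVE.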

In other embodiments of the invention, for
example, referring to Fig. 6, a fault resilient system
200 includes two IOPs 202 and two CEs 204. Each CE 204
is connected, through an IPI card 206 and a cable 208, to
an IPI card 210 of each IOP 202. IOPs 202 are
redundantly connected to each other through IPI cards 210
and cables 212. Because every component of system 200
has a redundant backup component, system 200 is entirely
fault resilient. In an alternative approach, cables 208
and 212 could be replaced by a pair of local area
networks to which each IOP 202 and CE 204 would be
connected. Indeed, local area networks can always be
substituted for cable connections.

System 200 is operating system and application
software independent in that it does not require
modifications of the operating system or the application
software to operate. Any single piece of hardware can be
upgraded or repaired in system 200 with no service
interruption. Therefore, by sequentially replacing each
piece of hardware and allowing system 200 to
resynchronize after each replacement, the hardware of
system 200 can be replaced in its entirety without
service interruption. Similarly, software on system 200
can be upgraded with minimal service interruption (that
is, during the software upgrade, the application will
become unavailable for an acceptable period of time such
as two seconds). Also, disaster tolerance for purposes
of availability can be obtained by placing each IOP/CE
pair in a separate location and connecting the pairs
through a communications link.

Referring to Fig. 7, a distributed, high
performance, fault resilient system 220 includes two
systems 200, the IOPs 202 of which are connected to each
other, through IPI modules, by cables 222. System 220
uses distributed computing environment software to
achieve high performance by running separate portions of
an application on each system 200. System 220 is fault
tolerant and offers the ability to perform both hardware
and software upgrades without service interruption.

Referring to Fig. 8, a fault tolerant system 230
includes three IOPs (232, 234, and 236) and three CEs
(238, 240, and 242). Through IPI modules 244 and cables
246, each IOP is connected to an IPI module 244 of each
of the other IOPs. Through IPI modules 248 and cables
250, each CE is connected to an IPI module 244 of two of
the IOPs, with CE 238 being connected to IOPs 232 and
234, CE 240 being connected to IOPs 232 and 236, and CE
242 being connected to IOPs 234 and 236. Like system
200, system 230 allows for hardware upgrades without
service interruption and software upgrades with only
minimal service interruption.

As can be seen from a comparison of Figs. 7 and 8,
the CEs and IOPs of systems 200 and 230 are identically
configured. As a result, upgrading a fault resilient
system 200 to a fault tolerant system 230 does not
require any replacement of existing hardware and entails
the simple procedure of adding an additional CE/IOP pair,
connecting the cables, and making appropriate changes to
the system software. This modularity is an important
feature of the paired modular redundant architecture of
the invention.

Because the components of system 230 are triply
redundant, system 230 is more capable of identifying the
source of a hardware fault than is system 10. Thus,
while system 10 simply disables one or both of CEs 14
when an error is detected, system 230 offers a higher
degree of fault diagnosis.

Referring to Fig. 9, each IOP (232, 234, 236) of
system 230 performs fault diagnosis according to a
procedure 300. Initially, each IOP (232, 234, 236)
checks for major faults such as power loss, broken
cables, and nonfunctional CEs or IOPs using well known
techniques such as power sensing, cable sensing, and
protocol timeouts (step 302). When such a fault is
detected, each IOP disables the faulty device or, if
necessary, the entire system.

After checking for major faults, each IOP waits to
receive IPI packets (that is, quantum interrupts or I/O
requests) from the two CEs to which the IOP is connected
(step 304). Thus, for example, IOP 232 waits to receive
IPI packets from CEs 238 and 240. After receiving IPI
packets from both connected CEs, each IOP transmits the
checksums ("CRCs") of those IPI packets to the other two
IOPs and waits for receipt of CRCs from the other two
IOPs (step 306).

After receiving the CRCs from the other two IOPs,
each IOP generates a three by three matrix in which each
column corresponds to a CE, each row corresponds to an
IOP, and each entry is the CRC received from the column's
CE by the row's IOP (step 308). Thus, for example, IOP
232 generates the following matrix:

              CE 238   CE 240   CE 242
   IOP 232 |   CRC      CRC       X
   IOP 234 |   CRC       X       CRC
   IOP 236 |    X       CRC      CRC

After generating the matrix, IOP 232 sums the entries in
each row and each column of the matrix. If the three row
sums are equal and the three column sums are equal (step
310), then there is no fault and IOP 232 checks again for
major faults (step 302).

If either the three rows' sums or the three
columns' sums are unequal (step 310), then IOP 232
compares the CRC entries in each of the columns of the
matrix. If the two CRC entries in each column match
(step 312), then IOP 232 diagnoses that a CE failure has
occurred and disables the CE corresponding to the column
for which the sum does not equal the sums of the other
columns (step 314).

If the CRC entries in one or more of the matrix
columns do not match (step 312), then IOP 232 determines
how many of the columns include mismatched entries. If
the matrix includes only one column with mismatched
entries (step 315), then IOP 232 diagnoses that the path
between the IOP corresponding to the matrix row sum that
is unequal to the other matrix row sums and the CE
corresponding to the column having mismatched entries has
failed and disables that path (step 316). For purposes
of the diagnosis, the path includes the IPI module 244 in
the IOP, the IPI module 248 in the CE, and the cable 250.

If the matrix includes more than one column with
mismatched entries (step 315), then IOP 232 confirms that
one matrix row sum is unequal to the other matrix row
sums, diagnoses an IOP failure, and disables the IOP
corresponding to the matrix row sum that is unequal to
the other matrix row sums (step 318).

If, after diagnosing and accounting for a CE
failure (step 314), path failure (step 316), or IOP
failure (step 318), IOP 232 determines that system 230
still includes sufficient non-faulty hardware to remain
operational, IOP 232 checks again for major faults (step
302). Because system 230 is triply redundant, system 230
can continue to operate even after several components
have failed. For example, to remain operating in an
availability mode, system 230 only needs to have a single
functional CE, a single functional IOP, and a functional
path between the two.
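
The matrix test of steps 310 to 318 can be sketched in Python. This
is a minimal illustration under stated assumptions: CRCs are plain
integers, the X entries are represented by None and contribute zero
to the sums, and the names diagnose and odd_one_out are hypothetical.
The sketch covers only a single fault in a fully operational system.

```python
from typing import List, Optional, Tuple

Matrix = List[List[Optional[int]]]   # rows: IOPs, columns: CEs, None marks X


def odd_one_out(values: List[int]) -> Optional[int]:
    """Index of the one value that differs from the others, if exactly one does."""
    for i, v in enumerate(values):
        others = values[:i] + values[i + 1:]
        if len(set(others)) == 1 and v != others[0]:
            return i
    return None


def diagnose(m: Matrix) -> Tuple[str, Optional[int], Optional[int]]:
    """Return (kind, iop_index, ce_index) following steps 310-318."""
    row_sums = [sum(x for x in row if x is not None) for row in m]
    col_sums = [sum(x for x in col if x is not None) for col in zip(*m)]
    if len(set(row_sums)) == 1 and len(set(col_sums)) == 1:
        return ("ok", None, None)                           # step 310
    # A column is mismatched when its two received CRCs differ.
    bad_cols = [j for j, col in enumerate(zip(*m))
                if len({x for x in col if x is not None}) > 1]
    if not bad_cols:                                        # step 312
        return ("ce", None, odd_one_out(col_sums))          # step 314
    if len(bad_cols) == 1:                                  # step 315
        return ("path", odd_one_out(row_sums), bad_cols[0])  # step 316
    return ("iop", odd_one_out(row_sums), None)             # step 318
```

Indices refer to positions in the matrix, so with the ordering used in
the text index 0 is IOP 232 for rows and CE 238 for columns.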

Using procedure 300, each IOP (232, 234, 236) can
correctly diagnose any single failure in a fully
operational system 230 or in a system 230 in which one
element (that is, a CE, an IOP, or a path) has previously
been disabled. In a system 230 in which an element has
been disabled, each IOP accounts for CRCs that are not
received because of the disabled element by using values
that appear to be correct in comparison to actually
received CRCs.

Procedure 300 is not dependent on the particular
arrangement of interconnections between the CEs and IOPs.
To operate properly, procedure 300 only requires that the
output of each CE be directly monitored by at least two
IOPs. Thus, procedure 300 could be implemented in a
system using any interconnect mechanism and does not
require point to point connections between the CEs and
IOPs. For example, the CEs and IOPs could be connected
to at least two local area networks. In an alternative
approach, instead of summing the CRC values in the rows
and columns of the matrix, these values can be compared
and those rows or columns in which the entries do not
match can be marked with a match/mismatch indicator.

A simplified version of procedure 300 can be
implemented for use in a system 200. In this procedure,
each IOP 202 of system 200 generates a two by two matrix
in which each column corresponds to a CE 204 and each row
corresponds to an IOP 202:

              CE 204   CE 204
   IOP 202 |   CRC      CRC
   IOP 202 |   CRC      CRC

After generating the matrix, each IOP 202 attaches a
mismatch indicator to each row or column in which the two
entries are mismatched.

If there are no mismatch indicators, then system
200 is operating correctly.

If neither row and both columns have mismatch
indicators, then an IOP 202 has faulted. Depending on
the operating mode of system 200, an IOP 202 either
disables another IOP 202 or shuts down system 200. The
IOP 202 to be disabled is selected based on user supplied
parameters similar to the two availability modes used in
system 10.

If both rows and neither column have mismatch
indicators, then a CE 204 has faulted. In this case,
IOPs 202 respond by disabling a CE 204 if system 200 is
operating in an availability mode or, if system 200 is
operating in an integrity mode, shutting down system 200.

If both rows and one column have mismatch indicators,
then one of the paths between the IOPs 202 and the CE 204
corresponding to the mismatched column has failed.
Depending on the operating mode of system 200, IOPs 202
either disable the CE 204 having the failed path or shut
down system 200. If both rows and both columns have
mismatch indicators, then multiple faults exist and IOPs
202 shut down system 200.

If one row and both columns have mismatch
indicators, then the IOP 202 corresponding to the
mismatched row has faulted. Depending on the operating
mode of system 200, the other IOP 202 either disables the
faulty IOP 202 or shuts down system 200. If one row and
one column have mismatch indicators, then the path
between the IOP 202 corresponding to the mismatched row
and the CE 204 corresponding to the mismatched column has
failed. Depending on the operating mode of system 200,
IOPs 202 either account for the failed path in future
processing or shut down system 200.
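
The two by two cases above amount to a lookup on the number of rows
and columns carrying a mismatch indicator. The following sketch shows
one way to express that lookup; the function names and the wording of
the returned strings are illustrative assumptions, and CRCs are again
modelled as plain integers.

```python
from typing import List, Tuple


def mismatch_indicators(m: List[List[int]]) -> Tuple[List[bool], List[bool]]:
    """Flag each row (IOP) and each column (CE) whose two entries differ."""
    rows = [m[0][0] != m[0][1], m[1][0] != m[1][1]]
    cols = [m[0][0] != m[1][0], m[0][1] != m[1][1]]
    return rows, cols


def classify(m: List[List[int]]) -> str:
    """Map the indicator pattern to the diagnosis given in the text."""
    rows, cols = mismatch_indicators(m)
    key = (sum(rows), sum(cols))     # (flagged rows, flagged columns)
    return {
        (0, 0): "no fault",
        (0, 2): "an IOP has faulted",
        (2, 0): "a CE has faulted",
        (2, 1): "a path to the CE of the flagged column has failed",
        (2, 2): "multiple faults",
        (1, 2): "the IOP of the flagged row has faulted",
        (1, 1): "the path between the flagged IOP and flagged CE has failed",
    }.get(key, "combination not covered by the text")
```

The response to each diagnosis (disable a component or shut down)
then depends on the operating mode, as described above.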

Referring to Fig. 10, one embodiment of a disaster
tolerant system 260 includes two fault tolerant systems
230 located in remote locations and connected by
communications link 262, such as Ethernet or fiber, and
operating in meta time lockstep with each other. To
obtain meta time lockstep, all IPI packets are
transmitted between fault tolerant systems 230. Like
system 220, system 260 allows for hardware and software
upgrades without service interruption.

As shown, the paired modular redundant
architecture of the invention allows for varying levels
of fault resilience and fault tolerance through use of
CEs that operate asynchronously in real time and are
controlled by IOPs to operate synchronously in meta time.
This architecture is simple and cost-effective, and can
be expanded or upgraded with minimal difficulty.

What is claimed is:

1. A method of synchronizing at least two
computing elements in a computer system including the at
least two computing elements and at least one controller,
wherein each of the computing elements have clocks that
operate asynchronously of the clocks of the other
computing elements, said method comprising the steps of:

selecting one or more signals from a set of
signals produced by the computing elements;

monitoring the computing elements to detect the
production of a selected signal by one of the computing
elements;

waiting for the production of a selected signal by
each other computing element after detection of a
selected signal by one of the computing elements;

transmitting equal time updates from the at least
one controller to each of the computing elements after
receipt of selected signals from all of the computing
elements; and

updating the clocks of the computing elements
based on the time updates.



2. The method of claim 1, further comprising the 
step of forming a fault resilient computer from the at 
least two computing elements and the at least one 
controller. 

3. The method of claim 1, wherein said selecting
step comprises the step of selecting I/O requests as the
selected signals.

4. The method of claim 3, further comprising the
steps of:

processing the I/O requests at the at least one
controller to produce I/O responses; and

transmitting the time updates with the I/O
responses from the at least one controller to the at
least two computing elements.

5. The method of claim 1, wherein said selecting
step comprises the step of selecting quantum interrupts
and I/O requests as the selected signals.



6. The method of claim 1, wherein said selecting 
step comprises the step of selecting quantum interrupts 
as the selected signals. 

7. The method of claim 6, further comprising the
step of generating quantum interrupts in each computing
element by counting clock cycles in the computing
elements.



8. The method of claim 7, wherein the step of
counting clock cycles includes counting the cycles of a
selected one of a system clock, an I/O clock, and a bus
clock.



9. The method of claim 7, further comprising the
steps of:

loading a counter in each of the computing
elements with a predetermined value;

enabling the counter in each of the computing
elements with an I/O request;

decrementing the value of the counter during a
clock cycle in each of the computing elements; and

signalling a quantum interrupt from a computing
element when the value of the counter of the computing
element reaches zero.

10. The method of claim 6, further comprising the
step of generating quantum interrupts by counting
executed instructions in each computing element.

11. The method of claim 6, further comprising the
step of using debugging features of each computing
element to generate quantum interrupts.

12. The method of claim 1, further comprising the
step of maintaining, for each computing element, a list
of the selected signals produced by the computing
element, wherein the equal time updates are transmitted
when the lists for each computing element include a
common entry.

13. The method of claim 1, further comprising the
steps of:

comparing the selected signals generated by the
computing elements and data, if any, accompanying the
selected signals, and

signalling that a fault has occurred if the
selected signals or the accompanying data do not match.

14. The method of claim 1, further comprising the
steps of:

stopping operation of each computing element after
that computing element produces a selected signal, and

resuming operation of a computing element upon
receipt by the computing element of a time update.

15. The method of claim 1, further comprising the
step of continuing operation of a computing element after
producing the selected signal.

16. The method of claim 1, further comprising the
steps of:

disabling asynchronous activities of the computing
elements; and

performing the functions of the asynchronous
activities at a computing element when the computing
element produces a selected signal.

17. The method of claim 16, wherein said
disabling step comprises the step of disabling normal
memory refresh functions, and said performing step
comprises the step of performing burst memory refreshes
when said selected signal is produced.

18. The method of claim 17, wherein said 
disabling step further comprises the steps of: 

placing a counter used in the normal memory 
refresh functions in a mode that requires an input value 
to a gate to change, and 

connecting the gate to a fixed voltage. 

19. A method of producing a fault resilient or 
fault tolerant computer, comprising the steps of: 

designating a first processor as a computing 
element; 

designating a second processor as a controller; 

connecting the computing element and the 
controller to produce a modular pair; 

connecting at least two modular pairs to produce a 
fault resilient or fault tolerant computer, 

wherein each computing element performs all 
instructions in the same number of cycles as the other 
computing elements. 

20. The method of claim 19, wherein the first and 
second processors are industry standard processors. 

21. The method of claim 19, further including the 
step of running industry standard operating systems and 
applications on the at least two controllers and the at 
least two computing elements. 

22. The method of claim 19, further including the 
steps of: 

running a first operating system on the at least 
two controllers; and 

running a second operating system on the at least 
two computing elements. 

23. The method of claim 19, further comprising 
the step of locating a modular pair remotely from the one 
or more other modular pairs to provide disaster 
tolerance. 

24. The method of claim 19, further comprising the 
steps of: 

connecting a first I/O device to a first modular 
pair; 

connecting a second I/O device to a second modular 
pair, said second I/O device being redundant of the first 
I/O device; and 

transmitting at least identical I/O write requests 
and data to the first and second I/O devices. 

25. The method of claim 24, further comprising the 
steps of: 

connecting a third I/O device to a third modular 
pair, said third I/O device being redundant of the first 
and second I/O devices; and 

transmitting at least identical I/O write requests 
and data to the first, second, and third I/O devices. 
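
The redundant write path of claims 24 and 25 can be pictured with a short Python sketch. `MemoryDisk` and `mirrored_write` are hypothetical stand-ins introduced here; the claims do not prescribe any particular device type or interface.

```python
class MemoryDisk:
    """Hypothetical stand-in for a physical I/O device attached to a modular pair."""

    def __init__(self):
        self.blocks = {}

    def write(self, block, data):
        self.blocks[block] = bytes(data)


def mirrored_write(devices, block, data):
    """Send the identical write request and data to every redundant device."""
    for device in devices:
        device.write(block, data)


if __name__ == "__main__":
    disks = [MemoryDisk(), MemoryDisk(), MemoryDisk()]
    mirrored_write(disks, block=7, data=b"payload")
    print([disk.blocks for disk in disks])
```

After each mirrored write the redundant devices hold the same contents, so any one of them can serve later reads if another fails.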

26. The method of claim 19, further comprising 
the step of activating an inactive processor by 
transferring the operational state of an active processor 
to the inactive processor through a controller. 

27. The method of claim 26, further comprising 
the step of pausing processing by said computing elements 
during said transferring step. 

28. The method of claim 26, further comprising 
the step of performing said transferring step as a 
background process without pausing processing by said 
computing elements. 

29. The method of claim 19, further comprising 
the step of upgrading a processor while said computing 
elements are processing by: 

disabling a processor to be upgraded; 

upgrading the disabled processor; and 

reactivating the upgraded processor by 
transferring the operational state of an active processor 
to the upgraded processor through a controller. 

30. The method of claim 19, further comprising 
the step of repairing a processor while said computing 
elements are processing by: 

disabling a processor to be repaired; 

repairing the disabled processor; and 

reactivating the repaired processor by 
transferring the operational state of an active processor 
to the repaired processor through a controller. 



31. A method of isolating I/O requests from 
computing operations in a computer, comprising the steps 
of: 

providing a virtual I/O unit that simulates the 
interface of a physical I/O device; 

intercepting an I/O request by a computing element 
that is addressed to the physical I/O device; 

providing the intercepted I/O request to the 
virtual I/O unit; 

transmitting the contents of the virtual I/O unit 
to an I/O processor; and 

at the I/O processor, providing the transmitted 
contents of the virtual I/O device to the physical I/O 
device. 
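
The steps of claim 31 can be sketched in Python as follows. All class and method names (`VirtualIOUnit`, `IOProcessor`, `PhysicalDevice`, `drain`, `deliver`) and the port number are assumptions chosen for illustration; the claim itself does not define an API.

```python
class PhysicalDevice:
    """Hypothetical physical I/O device that records what it receives."""

    def __init__(self):
        self.received = []

    def write(self, port, value):
        self.received.append((port, value))


class VirtualIOUnit:
    """Simulates the device interface but only buffers the requests."""

    def __init__(self):
        self.buffer = []

    def write(self, port, value):
        # The computing element believes it is writing to the real device.
        self.buffer.append((port, value))

    def drain(self):
        """Hand over the buffered contents for transmission and clear them."""
        contents, self.buffer = self.buffer, []
        return contents


class IOProcessor:
    """Applies transmitted virtual-unit contents to the physical device."""

    def __init__(self, device):
        self.device = device

    def deliver(self, contents):
        for port, value in contents:
            self.device.write(port, value)


if __name__ == "__main__":
    device = PhysicalDevice()
    virtual = VirtualIOUnit()
    iop = IOProcessor(device)
    virtual.write(0x60, 1)         # intercepted request lands in the virtual unit
    virtual.write(0x60, 2)
    iop.deliver(virtual.drain())   # contents are transmitted to the I/O processor
    print(device.received)
```

In this model the computing element never touches the physical device directly, so the timing of the real I/O operation is decoupled from the computing element's instruction stream.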

32. The method of claim 31, wherein said 
providing step includes providing a virtual keyboard. 

33. The method of claim 31, wherein said 
providing step includes providing a virtual display. 

34. The method of claim 31, further comprising 
the step of using the virtual I/O device to expose 
software errors caused by a software asynchrony that 
results in I/O race conditions. 

35. The method of claim 31, further comprising 
the steps of: 

intercepting a low level I/O request by a 
computing element; 

redirecting the intercepted low level I/O request 
to the I/O processor; 

at the I/O processor, performing the requested I/O 
operation to produce I/O results; and 

returning the I/O results to the computing 
element. 



36. A method of detecting and diagnosing faults 
in a computer system that includes at least two computing 
elements and at least two controllers, wherein each of 
the computing elements is connected to at least two of 
the controllers, and each controller is connected to at 
least two computing elements and to the other 
controllers, said method comprising the steps of: 

producing data at each of the computing elements; 

generating a value at each of the computing 
elements that relates to the produced data; 

transmitting the data, along with the 
corresponding values, from each computing element to the 
at least two connected controllers; 

transmitting the values received by each 
controller to the other controllers; and 

performing computations on the values 
corresponding to each computing element and the values 
corresponding to each controller; 

wherein, when the results of the computations 
performed on the values corresponding to each controller 
are equal, and the results of the computations performed 
on the values corresponding to each computing element are 
equal, no faults exist. 



37. The method of claim 36, further comprising, 
when the results of the computations performed on the 
values corresponding to each computing element and the 
results of the computations performed on the values 
corresponding to each controller are not equal, the steps 
of: 

comparing, for each one of the computing elements, 
all of the values corresponding to the one computing 
element, and 

designating one of the computing elements as 
faulty when the values corresponding to each computing 
element match for each computing element, but mismatch 
for different computing elements. 

38. The method of claim 36, further comprising, 
when the results of the computations performed on the 
values corresponding to each computing element and the 
results of the computations performed on the values 
corresponding to each controller are not equal, the steps 
of: 

comparing, for each one of the computing elements, 
all of the values corresponding to the one computing 
element, and 

designating a connection to one of the computing 
elements as faulty when the values corresponding only to 
the one computing element mismatch. 

39. The method of claim 36, further comprising, 
when the results of the computations performed on the 
values corresponding to each computing element and the 
results of the computations performed on the values 
corresponding to each controller are not equal, the steps 
of: 

comparing, for each one of the computing elements, 
all of the values corresponding to the one computing 
element, and 

when the values corresponding to two or more of 
the computing elements mismatch, designating the 
controller connected to the two or more computing 
elements as faulty. 
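
The diagnosis rules of claims 36 to 39 can be sketched in Python as shown below. This is an illustrative model only: the use of CRC-32 as the value related to the data, the function names `checksum` and `diagnose`, the verdict strings, and the majority vote used to name a suspect computing element are all assumptions that the claims do not specify. The `received` table is what any controller would hold after the controllers have exchanged the values they received.

```python
import zlib
from collections import Counter


def checksum(data: bytes) -> int:
    """Value related to the produced data; CRC-32 is an assumption here."""
    return zlib.crc32(data)


def diagnose(received):
    """Classify a fault from {controller: {computing element: value}}.

    Returns (verdict, suspects), where verdict is one of "none",
    "computing element", "connection", or "controller".
    """
    # Regroup so each computing element lists the copies of its value.
    copies = {}
    for controller, values in received.items():
        for element, value in values.items():
            copies.setdefault(element, {})[controller] = value

    # Elements whose value differs between the controllers that received it.
    split = sorted(e for e, c in copies.items() if len(set(c.values())) > 1)

    if len(split) >= 2:
        # Two or more elements mismatch: blame the controller they share.
        shared = set.intersection(*(set(copies[e]) for e in split))
        return ("controller", sorted(shared))
    if len(split) == 1:
        # Only one element mismatches: blame a connection to that element.
        return ("connection", split)

    # Every element is self-consistent, so compare elements with each other.
    value_of = {e: next(iter(c.values())) for e, c in copies.items()}
    votes = Counter(value_of.values())
    if len(votes) <= 1:
        return ("none", [])
    majority, count = votes.most_common(1)[0]
    if 2 * count <= len(value_of):
        return ("computing element", sorted(value_of))  # no clear majority
    return ("computing element",
            sorted(e for e, v in value_of.items() if v != majority))


if __name__ == "__main__":
    good, bad = checksum(b"result"), checksum(b"corrupted result")
    table = {"C1": {"CE1": good, "CE3": good},
             "C2": {"CE1": good, "CE2": bad},
             "C3": {"CE2": bad, "CE3": good}}
    print(diagnose(table))
```

The example table has each computing element connected to two of three controllers, which is one arrangement that satisfies the connectivity recited in claim 36.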
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40. A computer system, comprising: 
a controller, 

a first computing element connected to the 
controller, 

a second computing element connected to the 
controller, 

means for intercepting I/O operations by the first 
and second computing elements, and 

means for transmitting the intercepted I/O 
operations to the controller, 

wherein the first computing element performs each 
instruction of its instruction set in the same number of 
cycles as the second computing element takes to perform 
said instruction. 

41. The computer system of claim 40, wherein the 
controller and the first and second computing elements 
each include an industry standard motherboard. 

42. The computer system of claim 40, further 
comprising a second controller connected to the first 
controller and to the first and second computing 
elements. 

43. The computer system of claim 42, wherein the 
first controller and the first computing element are 
located in a first location and the second controller and 
the second computing element are located in a second 
location, and further comprising a communications link 
connecting the first controller to the second controller, 
the first controller to the second computing element, and 
the second controller to the first computing element. 



44. The computer system of claim 42, further 
comprising: 

a third controller; 

a fourth controller connected to the third 
controller; 

a third computing element connected to the third 
controller and the fourth controller; 

a fourth computing element connected to the third 
controller and the fourth controller; 

means for connecting the third and fourth 
controllers to the first and second controller; and 

means for distributing computing tasks between the 
computing elements, wherein the first and second 
computing elements perform a first set of computing tasks 
and the third and fourth computing elements perform a 
second set of computing tasks, 

wherein the third and fourth computing elements 
perform each instruction of their instruction sets in the 
same number of cycles as the first and second computing 
elements take to perform said instruction. 

45. The computer system of claim 42, wherein the 
first controller and the first computing element are 
remotely located from the second controller and the 
second computing element to provide disaster tolerance. 

46. The computer system of claim 40, wherein each 
of said first and second computing elements further 
comprises means for generating a quantum interrupt. 

47. A computer system, comprising: 

a first controller; 

a second controller connected to the first 
controller; 

a third controller connected to the first and 
second controllers; 

a first computing element connected to the first 
and second controllers; 

a second computing element connected to the second 
and third controllers; and 

a third computing element connected to the first 
and third controllers. 
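
The interconnection recited in claim 47 can be written down as a small table and checked mechanically. The Python sketch below is illustrative: the identifiers `C1` to `C3` and `CE1` to `CE3` and the helper `survives_controller_loss` are assumptions, and the single-failure property it checks is an observation that follows from the wiring rather than language from the claim.

```python
# Each computing element and the two controllers it is connected to (claim 47).
LINKS = {
    "CE1": {"C1", "C2"},
    "CE2": {"C2", "C3"},
    "CE3": {"C1", "C3"},
}


def survives_controller_loss(links, failed):
    """True if every computing element still reaches some controller."""
    return all(controllers - {failed} for controllers in links.values())


if __name__ == "__main__":
    for controller in ("C1", "C2", "C3"):
        print(controller, survives_controller_loss(LINKS, controller))
```

With this wiring, the loss of any single controller leaves every computing element attached to at least one working controller, whereas a layout in which all computing elements share one controller would not have that property.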

48. The computer system of claim 47, wherein the 
first controller and the first computing element are 
remotely located from the other controllers and computing 
elements to provide disaster tolerance. 

49. The computer system of claim 47, further 
comprising: 

means for intercepting I/O operations by the first 
computing element; 

means for transmitting the intercepted I/O from 
the first computing element to the first and second 
controllers; 

means for intercepting I/O operations by the 
second computing element; 

means for transmitting the intercepted I/O from 
the second computing element to the second and third 
controllers; 

means for intercepting I/O operations by the third 
computing element; and 

means for transmitting the intercepted I/O from 
the third computing element to the first and third 
controllers. 

50. The computer system of claim 47, further 
comprising: 

a fourth controller; 

a fifth controller connected to the fourth 
controller; 

a sixth controller connected to the fourth and 
fifth controllers; 

a fourth computing element connected to the fourth 
and fifth controllers; 

a fifth computing element connected to the fifth 
and sixth controllers; 

a sixth computing element connected to the fourth 
and sixth controllers; and 

a communications link for connecting the first, 
second, and third controllers to the fourth, fifth, and 
sixth controllers, wherein the first, second, and third 
controllers, and the first, second, and third computing 
elements are in a first location and the fourth, fifth, 
and sixth controllers, and the fourth, fifth, and sixth 
computing elements are in a second location. 
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J mume opeiatnmi, barrier synchnmizatioa. 

BOX II. OBSERVATIONS WHERE UNITY OF INVENTION WAS LACKING 
This ISA found multiple inventions as follows: 

Group I. Claims 1-18 are drawn to controlling processing elements via timing signals, classified in Class 395 Subclass 
550. 

Group II. Claims 19-30 and 36-39 are drawn to fault detection and recovery from faults in a processing system, 
classified in Class 395 Subclass 575. 

Group III. Claims 31-35 are drawn to the control and operation of I/O components, classified in Class 395 Subclass 
275. 

Group IV. Claims 40-50 are drawn to the generic combination and/or interconnections of processing components, 
classified in Class 395 Subclass 800. 

The four groups describe different techniques and apparatuses which do not share the same technical features. In 
particular, Group I is limited to a method of controlling or regulating a combination of processing elements according 
to a periodic sequence of timing pulses. Group II is limited to a method of detecting faults in redundant elements, 
including reconfiguration and/or recovery of the system with regard to a faulty processing element. Group III is drawn 
to the control and operation of I/O components, and Group IV is drawn to the generic combination and interconnections 
of processing elements and their peripheral components. All four thus describe different methods as related to special 
technical features. 